Data Science Project: Ethnic Bias of Standardized Testing

By: Mandy Yu, Fall 2021

The institutionalized standardized testing of New York State has inaccurately reflected students’ mental capacities. This project explores whether there is a correlation between New York Statewide English Language Arts and Math Exam results and student ethnicity across all school districts. The ethnic composition of test results is examined to expose the inaccuracy and unfairness of standardized testing.

Gathering and Cleaning Data

This project explores data on New York Statewide English Language Arts and Math exams from the 2012-13 to 2017-18 school years. In the 2018-19 school year, the statewide exam was changed from three days of testing to two. Thus, data from the 2018-19 school year was omitted. The project also utilizes geographical data to create a mapped visualization of the correlation between exam results and ethnicity.

The NY Statewide ELA and Math Exams are graded on a scale of 1-4, where a score of 3 or 4 is considered satisfactory.

ELA Test Results from 2013-2018 by District

Data obtained from OpenData NYC: https://data.cityofnewyork.us/Education/2013-2019-English-Language-Arts-ELA-Test-Results-S/gu76-8i7h

Math Test Results from 2013-2018 by District

Data obtained from OpenData NYC: https://data.cityofnewyork.us/Education/2013-2019-Math-Test-Results-School-SWD-Ethnicity-G/74ah-8ukf

To work effectively with these numeric values, we need to convert the columns to the appropriate datatypes. This applies to both the Math and ELA dataframes.
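As a minimal sketch of this conversion, the snippet below uses `pd.to_numeric` on a tiny stand-in dataframe. The sample rows, the `DBN` column, and the use of "s" as a placeholder for suppressed values are assumptions made for illustration; the real export may differ.

```python
import pandas as pd

# Hypothetical sample mimicking a raw export where numbers arrive as strings.
# The "s" placeholder for suppressed values is an assumption about the data.
ela = pd.DataFrame({
    "DBN": ["01M015", "01M019", "02M001"],
    "Grade": ["3", "3", "4"],
    "Mean Scale Score": ["289", "s", "310"],
    "% Level 3+4": ["20.5", "s", "61.2"],
})

numeric_cols = ["Mean Scale Score", "% Level 3+4"]

# errors="coerce" turns anything that cannot be parsed (such as "s") into NaN
ela[numeric_cols] = ela[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(ela.dtypes)
```

Coercing unparseable entries to NaN keeps the rows in place, so they can be dropped or ignored later when averaging.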

Ethnic Demographics of NYC Public Schools from 2013-2018

Data obtained from OpenData NYC: https://data.cityofnewyork.us/Education/2013-2018-Demographic-Snapshot-School/s52a-8aq6

I used apply() to extract the district of every school in the ELA, Math, and Demographic dataframes. The district is used later to group and visualize the results.
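A rough sketch of the district extraction is shown below. It assumes each school is identified by a `DBN` column whose first two characters are the district number; the helper name `extract_district` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample of school identifiers (DBN = district, borough, number)
schools = pd.DataFrame({"DBN": ["01M015", "13K282", "31R080"]})


def extract_district(dbn: str) -> int:
    """Return the district number taken from the first two characters of a DBN."""
    return int(dbn[:2])


# apply() calls the helper once per row of the DBN column
schools["District"] = schools["DBN"].apply(extract_district)

print(schools)
```

The same helper could be applied to each of the three dataframes so that they all share a `District` column.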

NYC Public School Location Data from 2017-18

Data obtained from OpenData NYC: https://data.cityofnewyork.us/Education/2017-2018-School-Locations/p6h4-mpyy

After cleaning the data to keep only the necessary columns and renaming them to be consistent with the other dataframes, the coordinates still need to be extracted from the location column.

If we take a closer look at one of the entries, it appears to be a Python dictionary. This makes it easy to index into each entry and extract the values needed.
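One possible way to pull the coordinates out of such dictionary entries is sketched here. The key names `latitude` and `longitude`, the column names, and the coordinate values are all assumptions for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical entries: key names and coordinate values are made up for this sketch
locations = pd.DataFrame({
    "DBN": ["01M015", "13K282"],
    "Location": [
        {"latitude": "40.7220", "longitude": "-73.9787"},
        {"latitude": "40.6733", "longitude": "-73.9755"},
    ],
})

# Index into each dictionary and convert the string values to floats
locations["Latitude"] = locations["Location"].apply(lambda loc: float(loc["latitude"]))
locations["Longitude"] = locations["Location"].apply(lambda loc: float(loc["longitude"]))

# The original dictionary column is no longer needed
locations = locations.drop(columns="Location")

print(locations)
```

If the entries were stored as strings rather than dictionaries, they would first need to be parsed, for example with `ast.literal_eval`.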

Analysis

The average mean scale score and average count/percentages of students were calculated for each grade in each district.
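The averaging step can be sketched with a pandas `groupby`, as below. The sample values are invented, and the column names `District`, `Grade`, `Mean Scale Score`, and `% Level 3+4` are assumed to match the cleaned dataframes.

```python
import pandas as pd

# Hypothetical school-level rows after cleaning
ela = pd.DataFrame({
    "District": [1, 1, 1, 2],
    "Grade": ["3", "3", "4", "3"],
    "Mean Scale Score": [290.0, 300.0, 305.0, 320.0],
    "% Level 3+4": [20.0, 30.0, 40.0, 60.0],
})

# One row per (district, grade) pair holding the mean of each numeric column
ela_avg = (
    ela.groupby(["District", "Grade"], as_index=False)[["Mean Scale Score", "% Level 3+4"]]
    .mean()
)

print(ela_avg)
```

Note that this is an unweighted mean across rows; weighting by the number of students tested would be a different design choice.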

ELA Average Test Scores by District and Grade

Math Average Test Scores by District and Grade

The percentage of students who received a satisfactory score (3 or 4) on the statewide exam is reflected in the % Level 3+4 column. Let's take a look at the range of values for every grade across all districts.
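As a sketch of how the range could be inspected, the snippet below takes the minimum and maximum district average for each grade. The dataframe name `ela_avg` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical district-level averages (one row per district and grade)
ela_avg = pd.DataFrame({
    "District": [1, 2, 3, 1, 2, 3],
    "Grade": ["3", "3", "3", "4", "4", "4"],
    "% Level 3+4": [25.0, 60.0, 40.0, 30.0, 55.0, 45.0],
})

# Lowest and highest district average for each grade
grade_range = ela_avg.groupby("Grade")["% Level 3+4"].agg(["min", "max"])

# Gap between the best and worst district, in percentage points
grade_range["spread"] = grade_range["max"] - grade_range["min"]

print(grade_range)
```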

In both the ELA and math averages, there is a large gap between districts. In the ELA data, one district had 20.78% of its third grade students from 2013-18 receive a satisfactory score of 3 or 4, while another district had 67.36%. This is a difference of nearly 47 percentage points. In the math data, one district had 15.15% of its fifth grade students from 2013-18 receive a 3 or 4 and another had 73.85%. That is a difference of nearly 59 percentage points.
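Because these gaps are differences between two percentages, they are most precisely expressed in percentage points. A quick check of the arithmetic, using only the figures quoted above (the variable names are just for illustration):

```python
# Lowest and highest district averages quoted in the text
ela_low, ela_high = 20.78, 67.36      # third grade ELA, % Level 3+4
math_low, math_high = 15.15, 73.85    # fifth grade math, % Level 3+4

# Gap between the highest and lowest district, in percentage points
ela_gap = round(ela_high - ela_low, 2)
math_gap = round(math_high - math_low, 2)

print(ela_gap, math_gap)
```

The gaps come to about 46.58 points for ELA and 58.70 points for math.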

With this information, it is evident that results differ between districts. In this project, I explore how ethnicity possibly influences the averages. So, let's take a look at each district's ethnic composition.

First, let's convert the datatypes of the appropriate columns to floats.

Now, let's average the demographics from 2013-18 for each district.

Now, let's join our tables together to be able to plot the data and look for any correlations.
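A minimal sketch of the join, assuming both tables share a `District` column and that the demographic table has exactly one row per district. The dataframe names and values are hypothetical.

```python
import pandas as pd

# Hypothetical test-score averages: one row per district and grade
ela_avg = pd.DataFrame({
    "District": [1, 1, 2, 3],
    "Grade": ["3", "4", "3", "3"],
    "% Level 3+4": [25.0, 40.0, 60.0, 40.0],
})

# Hypothetical demographic averages: one row per district
demo_avg = pd.DataFrame({
    "District": [1, 2, 4],
    "% Asian": [10.0, 35.0, 5.0],
    "% Black": [30.0, 8.0, 50.0],
})

# Inner join keeps only districts present in both tables;
# validate raises an error if a district appears twice in demo_avg
combined = ela_avg.merge(demo_avg, on="District", how="inner", validate="many_to_one")

print(combined)
```

An inner join silently drops districts missing from either table, so it is worth comparing row counts before and after the merge.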

Visualizations

Let's take a close look at the correlation between ethnicity and percentage of students in the district who received a satisfactory grade of 3 or 4.

% Asian vs % Level 3+4

There is a strong positive correlation of 0.751451 between the percentage of Asian students and the percentage of students who received a 3 or 4. As the percentage of Asian students increases, the percentage of students who receive a 3 or 4 also tends to increase. This means the share of Asian students possibly influences the percentage of students who receive a satisfactory score, although the correlation by itself only shows that the two rise together.

% Black vs % Level 3+4

There is a moderate negative correlation of -0.448200 between the percentage of Black students and the percentage of students who received a 3 or 4. As the percentage of Black students increases, the percentage of students who receive a 3 or 4 tends to decrease. This indicates a possible influence of ethnicity on the percentage of students who receive a satisfactory score: a higher percentage of Black students is associated with fewer students receiving a 3 or 4.
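Coefficients like the ones above can be computed with pandas, whose `corr` method uses the Pearson correlation by default. The sketch below runs on made-up numbers, so its output will not match the values reported in this project.

```python
import pandas as pd

# Hypothetical joined table; values are invented purely for illustration
combined = pd.DataFrame({
    "% Asian": [5.0, 10.0, 20.0, 35.0, 40.0],
    "% Black": [50.0, 40.0, 25.0, 10.0, 8.0],
    "% Level 3+4": [22.0, 30.0, 41.0, 58.0, 66.0],
})

# Pearson correlation between one demographic column and the outcome
r_asian = combined["% Asian"].corr(combined["% Level 3+4"])
r_black = combined["% Black"].corr(combined["% Level 3+4"])

# Full pairwise correlation matrix for all numeric columns
corr_matrix = combined.corr()

print(r_asian, r_black)
print(corr_matrix)
```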

Scatter Plots

% Asian vs % Level 3+4

% Black vs % Level 3+4

% White vs % Level 3+4

% Hispanic vs % Level 3+4

These scatter plots suggest that, for minority groups, the higher a school's population of minority students, the lower its average statewide test score. While ethnicity may not directly influence the scores, the plots suggest that some influence exists.
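Scatter plots of this kind could be drawn along the following lines. This sketch assumes matplotlib and a joined dataframe named `combined`; the data values are invented, and only two of the four groups are shown to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical joined table; values are invented purely for illustration
combined = pd.DataFrame({
    "% Asian": [5.0, 10.0, 20.0, 35.0, 40.0],
    "% Black": [50.0, 40.0, 25.0, 10.0, 8.0],
    "% Level 3+4": [22.0, 30.0, 41.0, 58.0, 66.0],
})

groups = ["% Asian", "% Black"]

# One panel per demographic column, sharing the y-axis for easy comparison
fig, axes = plt.subplots(1, len(groups), figsize=(10, 4), sharey=True)
for ax, col in zip(axes, groups):
    ax.scatter(combined[col], combined["% Level 3+4"])
    ax.set_xlabel(col)
    ax.set_title(f"{col} vs % Level 3+4")
axes[0].set_ylabel("% Level 3+4")
fig.tight_layout()
```

Sharing the y-axis makes the direction and steepness of each trend directly comparable across panels.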

Let's take a look at the distribution of grades.